Most of the macromolecular structures in the Protein Data Bank (PDB), which 
are used daily by thousands of educators and scientists alike, are determined by 
X-ray crystallography. It was examined whether the crystallographic models and 
data were deposited to the PDB at the same time as the publications that 
describe them were submitted for peer review. This condition is necessary to 
ensure pre-publication validation and the quality of the PDB public archive. It 
was found that a significant proportion of PDB entries were submitted to the 
PDB after peer review of the corresponding publication started, and many were 
only submitted after peer review had ended. It is argued that a clear description of 
journal policies and effective policing are important for pre-publication 
validation, which is key in ensuring the quality of the PDB and of the 
peer-reviewed literature. 

Since the mid-1990s, peer-reviewed journals and the crystallographic 
community have worked towards the notion that crystallographic 
models and the associated diffraction data should be submitted to the 
Protein Data Bank (Baker et al, 1996) and publicly released upon 
publication (Wlodawer et al, 1998; Editorial, 1998; Baker & Saenger, 
1999). This is nowadays the norm, and deviations from that rule are 
rare. As many as 99.8% of crystallographic structures submitted to 
the PDB in 2011-2013 make available both the model and the 
experimental data. This also enables critical re-evaluation of 
submitted models, based on the original diffraction data but in the 
light of improved methods and software (Joosten et al, 2009). 
However, the time frame for data submission has been less well 
defined: should data be available in one of the wwPDB (Berman et 
al, 2003) sites before the paper is submitted, before it is accepted for 
publication, or merely after the paper is accepted, just before 
publication? 

Recently, a Validation Task Force assigned by the PDB has 
published a recommendation (Read et al, 2011) that the submission 
of papers that report on crystallographic data should be accompanied 
by a validation report issued from the PDB. An obvious prerequisite 
for this is that both the experimental data and the model coordinates 
are submitted to the PDB before paper submission. 
Such reports are indispensable tools for technical review of the paper 
by the assigned referees (Read et al, 2011), and crucial for ensuring 
that any claims based on the structure are supported by data of 
appropriate quality. 



2. Materials and methods 

The original data presented in this paper are available in public 
databases (PDB and PubMed); a data digest relevant to our 
conclusions is included as Supplementary Material;¹ and all the 
code and the database, as well as minimal instructions to reproduce all 
the results, have been uploaded to GitHub, at the repository 
https://github.com/massyah/PdbMine. 

¹ Supplementary material has been deposited in the IUCr electronic archive 
(Reference: DZ5303). Services for accessing this material are described at the 
back of the journal. 

Table 1 

Numbers and percentages of papers for which the associated PDB entries were submitted after the submission date 
or after the acceptance or publication date, per journal and associated journal impact factors (IF), for journals for 
which data were available for more than 100 structures for the period between 2000 and 2012. 

[Table body not recoverable from the extracted text. Surviving fragments, in order of appearance: Proteins; Cell; 185; 84; 3.6; BMC Struct. Biol.; 142; 11; Nature Commun.; 119; 119; 59; 50; 50.] 

Figure 1 

Deposition dates of structures during the different editorial phases of the corresponding manuscript. Red columns 
show the percentage of structures that were deposited after the manuscript was accepted (or after it was published if 
acceptance dates were not available) and blue columns show the percentage of structures deposited after the 
manuscript was submitted for review but before it was accepted/published. The lines show the number of 
manuscripts for which the appropriate editorial history was available for each of these categories. Note that before 
2000 insufficient data were available on manuscript submission dates. 

Briefly, the identifiers of PDB records 
with an associated 'Primary citation' were 
retrieved from the RCSB web server on 28 
June 2013 at 15:25 GMT+1 (91 738 
unique IDs). The corresponding PDB 
entries were downloaded from the 
ftp.wwpdb.org FTP server, parsed, and 
the PDB fields relevant for this study 
(namely PDB ID, date of deposition, 
associated PubMed ID) were stored in a 
SQLITE3 database. The PubMed entries 
of all associated citations were downloaded 
from the PubMed web server 
using the EUTILS suite and then parsed 
and stored in the SQLITE3 database. 
From the PubMed-associated MEDLINE 
records, we extracted (if available) the 
following dates: the received, revised, 
accepted and ahead-of-print dates 
from the publication history (PHST) field; 
date of publication (DP); date created 
(DA); PubMed Central release date 
(PMCR); date of electronic publication 
(DEP) and Entrez Date (EDAT). The 
'earliest public date' is then defined 
as the earliest of the PubMed dates; 
while the 'earliest publication date' is 
defined as the earliest of the DP, EDAT, 
DA, DEP and the 'ahead of print', 
'accepted' dates from the PHST. We 
then considered for this analysis the 
inner join of the PDB entries table with 
the PubMed table, where we only kept 
entries for which (i) the earliest public 
date was after 1 January 1995; (ii) the 
published date and accepted date were 
before 1 January 2014 or available; and 
(iii) either the publication history was 
available or the received date was earlier 
than the accepted or published date; 
totalling 69 026 unique PDB entries 
joined with 35 924 unique PubMed 
entries. 
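
The two date definitions above can be sketched in a few lines of Python. This is a minimal illustration and not the code from the PdbMine repository: the dictionary keys (`received`, `accepted`, `aheadofprint`, `DP` and so on) and the helper names are assumptions chosen for this example, the dates are assumed to have been parsed into `datetime.date` objects already, and the example record is invented.

```python
from datetime import date

# Hypothetical keys for the dates extracted from one MEDLINE record.
PHST_FIELDS = ("received", "revised", "accepted", "aheadofprint")
OTHER_FIELDS = ("DP", "DA", "PMCR", "DEP", "EDAT")
# Fields that enter the 'earliest publication date' as defined in the text.
PUBLICATION_FIELDS = ("DP", "EDAT", "DA", "DEP", "aheadofprint", "accepted")


def earliest(record, fields):
    """Return the earliest available date among `fields`, or None."""
    dates = [record[f] for f in fields if record.get(f) is not None]
    return min(dates) if dates else None


def earliest_public_date(record):
    # Earliest of all PubMed dates, including 'received' and 'revised'.
    return earliest(record, PHST_FIELDS + OTHER_FIELDS)


def earliest_publication_date(record):
    # Earliest of the dates that mark acceptance or publication.
    return earliest(record, PUBLICATION_FIELDS)


# Invented example record, for illustration only.
example = {
    "received": date(2012, 3, 1),
    "accepted": date(2012, 5, 20),
    "DP": date(2012, 7, 1),
    "EDAT": date(2012, 6, 2),
}
print(earliest_public_date(example))       # 2012-03-01
print(earliest_publication_date(example))  # 2012-05-20
```

When the received date is present it is normally the earliest PubMed date, so under these assumptions the 'earliest public date' acts as a proxy for manuscript submission.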

All entries were considered to be 'on 
time' by default. We defined as 'deposited 
after acceptance' those entries for which 
the date of deposition with the PDB was 
more than two days after the 'earliest 
publication date'. We identified as 
'deposited after submission' those entries 
that were not 'deposited after acceptance' 
but for which deposition with the PDB 
was more than two days after the 'earliest 
public date'. The impact-factor estimates 
used to build Table 1 originate from 
the Thomson Reuters Journal Citation 
Reports Science Edition 2011 
(http://thomsonreuters.com/journal-citation-reports/). 
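
The three-way classification described above can be expressed as a short function. This is a sketch under stated assumptions rather than the original implementation: the function name `classify`, its argument names and the treatment of missing dates (a missing date never triggers a 'late' label) are choices made for this illustration.

```python
from datetime import date, timedelta

TOLERANCE = timedelta(days=2)  # the two-day margin stated in the text


def classify(deposition, earliest_public, earliest_publication):
    """Label one PDB entry following the rules described in the text.

    Missing dates (None) never trigger a 'late' label, so an entry
    stays 'on time' by default.
    """
    if (earliest_publication is not None
            and deposition - earliest_publication > TOLERANCE):
        return "deposited after acceptance"
    if (earliest_public is not None
            and deposition - earliest_public > TOLERANCE):
        return "deposited after submission"
    return "on time"


# Invented dates, for illustration only.
received, accepted = date(2012, 3, 1), date(2012, 5, 20)
print(classify(date(2012, 3, 2), received, accepted))   # on time
print(classify(date(2012, 4, 10), received, accepted))  # deposited after submission
print(classify(date(2012, 6, 1), received, accepted))   # deposited after acceptance
```

Checking the acceptance rule first mirrors the text, where 'deposited after submission' only applies to entries that are not already 'deposited after acceptance'.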

3. Results and discussion 

3.1. Correlating the dates of crystallographic structure and data 
submission to the PDB and of manuscript submission for peer review 

The results from the analysis of the PDB deposition date against 
the submission and acceptance dates were manually curated to select 
journals with at least 100 publications that referred to PDB entries 
over the last 12 years, and are presented in Table 1. The number of 
structures submitted to the PDB only after the paper was accepted 
for publication has historically been rather low (less than 10% since 
1999) and has declined over the years, to just 3.4% (205 of 
6003 papers) in 2012 (Fig. 1). However, the number of structures 
submitted to the PDB after the paper has been submitted for review 
is, somewhat surprisingly, high. Although tracing the submission date 
is not possible for all publications, we were able to extract that 
information for about 50% of the structures published in 2012, and 
about one third of them were deposited after the paper was submitted 
to the journal for peer review. It is also noteworthy that a quarter of 
the depositions in the window between manuscript submission and 
manuscript acceptance occurred within the last six days before 
manuscript acceptance (Supplementary Fig. S1). It is unlikely that 
referees had access to PDB validation reports in that time window, 
and more likely that formal acceptance of the manuscript was 
postponed until the structure was deposited. 
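
The share of depositions falling in the last days before acceptance could be computed along the following lines. This is only a sketch under stated assumptions: the function name `share_deposited_late_in_window`, the tuple layout and the exact boundary handling (a deposition counts as inside the window when it falls strictly after the received date and no later than the accepted date, and as 'late' when it is at most six days before acceptance) are choices made for this illustration and are not taken from the original analysis. The dates in the usage example are invented.

```python
from datetime import date


def share_deposited_late_in_window(entries, last_days=6):
    """Fraction of in-window depositions made in the final `last_days` days.

    `entries` is an iterable of (deposited, received, accepted) dates.
    Only depositions with received < deposited <= accepted are counted.
    Returns None when no deposition falls inside the window.
    """
    in_window = [
        (deposited, accepted)
        for deposited, received, accepted in entries
        if received < deposited <= accepted
    ]
    if not in_window:
        return None
    late = sum(
        1 for deposited, accepted in in_window
        if (accepted - deposited).days <= last_days
    )
    return late / len(in_window)


# Invented dates, for illustration only.
sample = [
    (date(2012, 4, 1), date(2012, 3, 1), date(2012, 5, 20)),   # mid-review
    (date(2012, 5, 18), date(2012, 3, 1), date(2012, 5, 20)),  # 2 days before acceptance
    (date(2012, 2, 1), date(2012, 3, 1), date(2012, 5, 20)),   # before submission, ignored
    (date(2012, 3, 15), date(2012, 3, 1), date(2012, 5, 20)),  # mid-review
]
print(share_deposited_late_in_window(sample))  # 1 of 3 in-window entries
```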

3.2. Confidentiality versus transparency issues 

Many authors are worried that submission of a structure to the 
PDB will trigger competitors to accelerate their own paper 
submission. This is a legitimate concern and, having been at the receiving 
end of this practice ourselves, we can confirm that it is not a pleasant 
experience. However, this concern is ameliorated by an existing 
submission-time option whereby the sequences corresponding to the 
submitted structures are not made publicly available before the entry is 
finally released. The 
possibility of not directly disclosing the sequence is popular: it is 
currently used by about two thirds of entries awaiting release. A 
submission-time option to also withhold the title, currently only 
possible upon request, would undoubtedly prove equally popular and 
could help to remove the remaining concerns. 

3.3. Some journals are more equal than others 

Urban legend has it that high-impact journals are notorious for 
tolerating late submission as they typically publish 'hot' structures, 
which many research groups are competing to be the first to 
determine: to paraphrase a well known quotation (Orwell, 1945), all 
journals are equal, but some journals are more equal than others. 
Indeed, we find that journals with a high impact factor for which we 
could trace the full publication history (the list most regrettably does 
not include important journals like Science, Proc. Natl Acad. Sci. USA 
and J. Biol. Chem., which do not make the complete publication 
history available in the PubMed/MEDLINE records) are more likely 
to tolerate late submission of crystallographic data (Supplementary 
Fig. S2). A notable exception to this rule is Acta Crystallographica 
Section D, which traditionally had a significantly lower impact factor 
(between 1 and 3) and has only shot to impact-factor prominence 
over the last couple of years (mainly owing to the publication of 
highly cited methodological papers). One of the best performing 
journals in recent years is Proteins, which unsurprisingly has a simple, 
clear and short policy statement in the instruction for authors: 'For all 
crystallographic studies, coordinates and structure factors should be 
deposited in the Protein Data Bank at the time of manuscript 
submission'. This policy, unlike others (a survey of the policies of 
different journals is available as Supplementary Table S1), is explicit 
about the timing of deposition. Clarity about policies is crucial, but 
ensuring that the policies are honored is key. 

4. Conclusion 

As we are confident that all journals strive for transparency in the 
publication procedure and for rigor in the reported results, we 
strongly advocate that the editorial teams improve the clarity of their 
policies, and enforce these effectively. Structural biologists, 
authors and reviewers alike, should also share the responsibility for 
following these policies. As a community we must strive to ensure 
that coordinates and experimental data for macromolecular models 
are submitted to the PDB at the same time as the paper is submitted 
for review. Only then will validation reports also become available to 
the referees as part of the necessary material for peer review. 

RPJ is supported by a Veni grant 722.011.011 from the Netherlands 

The Worldwide Protein Data Bank (wwPDB) strongly agrees with 
the overall views expressed by Joosten et al. (2013) in their article 
about timely deposition of macromolecular structures in the Protein 
Data Bank. In 2010, Acta Crystallographica Section D began to 
require validation reports as part of the manuscript-submission 
process. In that same year, the wwPDB sent letters to the key journals 
that publish structures requesting that they require authors to submit 
wwPDB validation reports at the same time as their manuscripts. In 
this way reviewers are able to better evaluate the work. The Journal 
of Biological Chemistry, which is currently the journal that publishes 
the largest number of papers per year about structures of biological 
macromolecules, began requiring these reports in 2012. 

Joosten et al. suggest that it would be helpful to have an option to 
suppress entry titles at the time of submission to the PDB until the 
structure is released. Policy matters such as this are regularly 
reviewed by the wwPDB partners and its Advisory Committee 
(wwPDB AC). The issue was discussed at our 2013 meeting, and it 
was agreed that we will make this option available in the new wwPDB 
Deposition Tool that will be launched early in 2014. 

References 



Joosten, R. P., Soueidan, H., Wessels, L. F. A. & Perrakis, A. (2013). Acta Cryst. 
D69, 2293-2295. 



doi:10.1107/S0907444913029168 



Acta Cryst. (2013). D69, 2296 



